Abstract

The Advanced Encryption Standard (AES) is widely recognized as the most important
block cipher in common use today. This high assurance in AES rests on
its resistance to ten years of extensive cryptanalysis, which has shown no weakness,
not even a deviation from the statistical behaviour expected from a random permutation.
Only reduced versions of the cipher have been broken, but they are not
usually implemented. In this paper we build a distinguishing attack on the AES,
exploiting the properties of a novel cipher embedding. With our attack we give some 
statistical evidence that the set of AES-128 encryptions acts on the message space in 
a way significantly different than that of the set of random permutations acting on 
the same space. While we feel that more computational experiments by independent 
third parties are needed in order to validate our statistical results, we show that the 
non-random behaviour is the same as we would predict using the property of our 
embedding. Indeed, the embedding lowers the nonlinearity of the AES rounds and 
therefore the AES encryptions tend, on average, to keep low the rank of low-rank 
matrices constructed in the large space. Our attack needs $2^{23}$ plaintext-ciphertext
pairs and costs the equivalent of $2^{48}$ encryptions.

We expect our attack to work also for AES-192 and AES-256, as confirmed by
preliminary experiments. 



Introduction 

The Advanced Encryption Standard (AES) is widely recognized as the
most important block cipher in common use today [Nat01]. Its 256-bit
version (AES-256) can even be used at top secret level ([CGC05]). This high
assurance in AES rests on its resistance to ten years of extensive cryptanalysis:
AES has shown no weakness, not even a deviation from the statistical
behaviour expected from a random permutation. Only reduced versions of
the cipher have been broken, but they are not usually implemented (see e.g.
[RST10], Section 2).

For a high-security cipher it is essential that nobody can distinguish its
encryption functions from random functions. It is not enough that the encryption
function associated to a key cannot be distinguished from a random map
(single-key attack), or that the encryption functions associated to related keys
cannot be distinguished from a set of random maps. A high-security cipher
must behave so randomly that it is impossible to distinguish (a random
sample of) the whole set of AES encryptions from (a random sample of) the
set of random permutations.

In this paper we build a special kind of distinguishing attack on the AES. 
To be more precise, with our attack we give some statistical evidence that the 
set of AES-128 encryptions acts on the message space in a way different than 
that of the set of random permutations acting on the same space. In this paper 
we do not claim any other successful distinguishing attack, neither single-key 
nor related-keys. 

Our attack has a subtle theoretical justification. We are able to embed 
the AES (and actually also other translation-based ciphers) in a larger cipher, 
as explained in full detail in [RST10]. This embedding is designed to lower
the non-linearity of the AES rounds. The decrease in the non-linearity should
be detectable by analysing the ranks of some matrices (similarly to a Marsaglia
Die-Hard test [NIS00]). While we feel that more computational experiments by
independent third parties are needed in order to validate our statistical results, 
we show that the non-random behaviour is the same as we would predict using 
the property of our embedding. Indeed, we observe that the AES encryptions 
tend, on average, to keep low the rank of low-rank matrices constructed in the 
large space. This holds true apparently for all standard AES versions. 

Our attack needs $2^{23}$ plaintext-ciphertext pairs and costs the equivalent of
$2^{48}$ encryptions, thanks to a highly specialized rank-computation algorithm.

The remainder of this paper is organized as follows. In Section 1 we sketch 
the internal structure of the AES, we explain our embedding and we treat some 
statistical models related to statistical attacks. In Section 2 we describe our 
attack strategy. In Section 3 we report the numerical results of our attack, including
results on different AES versions. In Section 4 we discuss some computational 
matters, presenting a rank-computation algorithm. In Section 5 we provide 
our conclusions and several remarks. 



1 Preliminaries 



In this section we mainly follow the notations and the approach in [RST10], 
including viewing AES as a translation-based cipher. 

1.1 An AES description



In this subsection we recall the essential structure of the AES cryptosystem 
viewed as translation-based (for a more traditional approach see [DR02]). 

Let $V = (\mathbb{F}_2)^r$ with $r = 128$ be the space of all possible messages (plaintexts
or ciphertexts). Let $\mathcal{K} = (\mathbb{F}_2)^\ell$ be the finite set of all possible keys
(with $\ell = 128, 192, 256$). Any key $k \in \mathcal{K}$ specifies an encryption function $\phi_k$.
Let $x \in V$ be any plaintext. In order to obtain the corresponding ciphertext
$y = \phi_k(x) \in V$, the encryption proceeds through $N = 10, 12, 14$ similar
rounds, respectively (depending on $\ell$), as described below.

A preliminary translation via (addition with) the first round key $k^{(0)}$ in
$(\mathbb{F}_2)^r$ is applied to the plaintext to form the input to the first round (Round
1). Other $N$ rounds follow.

Let $1 \le \rho \le N - 1$. A typical round (Round $\rho$) can be written as the
composition$^{1}$ $\gamma \lambda \sigma_{k^{(\rho)}}$, where the map $\gamma$ is called SubBytes and works in parallel
on each of the 16 bytes of the data (SubBytes is composed of two transformations:
the inversion in $\mathbb{F}_{2^8}$ and an affine transformation over $\mathbb{F}_2$); the linear map
$\lambda : V \to V$ is the composition of two linear operations known as ShiftRows and
MixColumns; $\sigma_{k^{(\rho)}}$ is the translation with the round key $k^{(\rho)}$ (this operation is
called Add Round Key).

The last round (Round $N$) is atypical and can be written as $\gamma \bar{\lambda} \sigma_{k^{(N)}}$, where
the affine map $\bar{\lambda}$ is the ShiftRows operation.

Obviously, we can see the linear operation $\lambda$ as a matrix $M$. We observe
that the order of $\lambda$ is quite small and equal to 8: $\lambda^8 = 1_V$.
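
The order of the mixing layer can be checked numerically. The following is a minimal sketch, assuming the standard AES ShiftRows and MixColumns on a column-major 16-byte state and the AES reduction polynomial $x^8 + x^4 + x^3 + x + 1$; the function names are ours and do not come from [RST10].

```python
def xtime(a):
    """Multiply a byte by x in F_256 = F_2[x]/(x^8 + x^4 + x^3 + x + 1)."""
    a <<= 1
    return a ^ 0x11B if a & 0x100 else a


def shift_rows(state):
    # Column-major state: state[r + 4*c] is the byte in row r, column c.
    # Row r is rotated left by r positions.
    return [state[r + 4 * ((c + r) % 4)] for c in range(4) for r in range(4)]


def mix_columns(state):
    out = []
    for c in range(4):
        col = state[4 * c:4 * c + 4]
        for r in range(4):
            a0, a1, a2, a3 = (col[(r + j) % 4] for j in range(4))
            # Row r of the circulant matrix (02, 03, 01, 01).
            out.append(xtime(a0) ^ xtime(a1) ^ a1 ^ a2 ^ a3)
    return out


def mixing_layer(state):
    """The linear map lambda: ShiftRows followed by MixColumns."""
    return mix_columns(shift_rows(state))


def order_of_mixing_layer(max_order=64):
    # lambda is F_256-linear, so tracking the 16 unit vectors is enough.
    basis = [[int(i == j) for j in range(16)] for i in range(16)]
    images = [list(e) for e in basis]
    for k in range(1, max_order + 1):
        images = [mixing_layer(v) for v in images]
        if images == basis:
            return k
    return None


ORDER = order_of_mixing_layer()
print(ORDER)
```

Working on the 16 unit vectors rather than on all $2^{128}$ states is sufficient here because the map is linear over $\mathbb{F}_{256}$.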



1.2 The embedding we are using 



We are interested in particular space embeddings, where $V = (\mathbb{F}_2)^r$
and $W$ is the vector space $(\mathbb{F}_2)^s$, with $s > r$. We want to embed $V$
into $W$ by an injective map $\alpha$ and to extend $\phi_k \in \mathrm{Sym}(V)$ to a permutation
$\phi'_k \in \mathrm{Sym}(W)$ as shown in the following commutative diagram:

$$
\begin{array}{ccc}
V & \xrightarrow{\ \alpha\ } & W \\
{\scriptstyle \phi_k} \downarrow & & \downarrow {\scriptstyle \phi'_k} \\
V & \xrightarrow{\ \alpha\ } & W
\end{array}
$$

In order to do this, we have to define the permutation $\phi'_k \in \mathrm{Sym}(W)$. We
say that $\phi'_k$ is an extension of $\phi_k$. Let $r = mb$, let $s = 2^m bt$. According to the
setting described in [RST10, Section 4], the space embedding $\alpha : V \to W$ we
consider is defined as follows

$$\alpha(v) = (\varepsilon(v), \varepsilon(Mv), \ldots, \varepsilon(M^{t-1}v)) \qquad (1)$$

where: 

a) $\varepsilon : (\mathbb{F}_{2^m})^b \to ((\mathbb{F}_2)^{2^m})^b$ is a parallel map $\varepsilon(v_1, \ldots, v_b) = (\varepsilon'(v_1), \ldots, \varepsilon'(v_b))$;

b) the map $\varepsilon' : \mathbb{F}_{2^m} \to (\mathbb{F}_2)^{2^m}$ is defined by means of a primitive element $\eta$
of $\mathbb{F}_{2^m}$ as

$$\varepsilon'(0) = (1, 0, \ldots, 0), \qquad \varepsilon'(\eta^i) = (0, \ldots, 0, 1, 0, \ldots, 0) \quad \forall\, 1 \le i \le 2^m - 1;$$

c) the matrix $M$ in $GL((\mathbb{F}_2)^{mb})$ has order $t$.

Moreover, for byte-oriented Mixing Layers, i.e. if $M \in GL((\mathbb{F}_{2^m})^b)$, the
following bound has been proved as Proposition 4.2 in [RST10]:

$$\dim_{\mathbb{F}_2}(\langle \mathrm{Im}(\alpha) \rangle) \le 2^m bt - (bt - 1) - mb(t - 1).$$

1 Note that the order of the operations is exactly: $\gamma$, $\lambda$, and then $\sigma_k$.

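Let $M : V \to V$ be the MixingLayer of AES. The map $\alpha : V \to W$ we
propose for AES is defined as follows

$$\alpha(v) = (\varepsilon(v), \varepsilon(Mv), \ldots, \varepsilon(M^7 v)), \qquad (2)$$

where: $\varepsilon : (\mathbb{F}_2)^{128} \to (\mathbb{F}_2)^{4096}$, $\varepsilon' : \mathbb{F}_{256} \to (\mathbb{F}_2)^{256}$, $M \in GL((\mathbb{F}_2)^{128})$.

We have $t = 8$, $b = 16$ and $m = 8$. In Fact 4.4 ([RST10]) we determined the
dimension of $\langle \mathrm{Im}(\alpha) \rangle$, for $\alpha$ in (2), using the fact that $M \in GL((\mathbb{F}_{256})^{16})$:

$$\dim_{\mathbb{F}_2}(\langle \mathrm{Im}(\alpha) \rangle) = 2^m bt - (bt - 1) - mb(t - 1) = 31745.$$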
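
To make the sizes concrete, here is a small sketch of the byte-level map and of the arithmetic behind the dimension. It rests on assumptions that are ours: the AES reduction polynomial, the primitive element $\eta$ = 0x03, and the convention that the single 1 of the image of $\eta^i$ sits at index $i$ (index 0 being reserved for the zero byte). The paper's own conventions may differ; only the lengths and weights matter here.

```python
def xtime(a):
    """Multiply a byte by x modulo x^8 + x^4 + x^3 + x + 1 (assumed polynomial)."""
    a <<= 1
    return a ^ 0x11B if a & 0x100 else a


def build_log_table():
    # Discrete logarithms with respect to eta = 0x03 (assumed primitive element).
    log, x = {}, 1
    for i in range(1, 256):
        x = xtime(x) ^ x  # multiply by 0x03
        log[x] = i
    return log


LOG = build_log_table()


def eps_prime(byte):
    """One-hot image of a field element: a vector of length 2^m = 256."""
    vec = [0] * 256
    vec[0 if byte == 0 else LOG[byte]] = 1
    return vec


def eps(v):
    """Parallel map on b = 16 bytes: a vector of length 16 * 256 = 4096."""
    out = []
    for byte in v:
        out.extend(eps_prime(byte))
    return out


m, b, t = 8, 16, 8
block = eps(bytes(range(16)))
row_length = t * len(block)   # alpha(v) concatenates t blocks
row_weight = t * sum(block)   # each block contributes b ones
dimension = 2**m * b * t - (b * t - 1) - m * b * (t - 1)
print(len(block), row_length, row_weight, dimension)
```

The row length equals $2^{15}$ and the row weight equals $bt$, independently of the particular one-hot convention chosen above.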

The encryption $\phi_k$ is the composition of Add Round Key, SubBytes and
MixingLayer. So the only part of $\phi'_k$ which is not linear is the SubBytes operation.

Remark 1.1. The goal of our $\phi'_k$ construction is to have the non-linearity of
SubBytes decrease.

1.3 On randomness and statistical distinguishers 

When a statistical test on data from a cryptographic algorithm is per- 
formed, we wish to test whether the data "seem" random or not. It is impos- 
sible to design a test that gives a decisive answer in all cases. However, there 
are many different properties of randomness and non-randomness, and it is 
possible to design tests for these specific properties. An example of a set of 
tests is the NIST Test Suite [NIS00]. It is a statistical package consisting of 16 
tests that were developed to test the randomness of (arbitrarily long) binary 
sequences produced by cryptographic random or pseudorandom number gen- 
erators. These tests focus on a variety of different types of non-randomness 
that could exist in a sequence. For example, the Marsaglia "Die Hard" test
consists of determining whether the statistics of ranks of $(32 \times 32)$ matrices
over $\mathbb{F}_2$, constructed with bits coming from the sequence, agree with the
theoretical distributions.
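
As an illustration of such a rank test, the sketch below compares the empirical rank frequencies of random $32 \times 32$ binary matrices with the standard closed formula for the rank distribution of a uniformly random matrix over $\mathbb{F}_2$. The helper names, the sample size and the use of Python's pseudorandom generator are our choices, not part of any test suite.

```python
import random


def gf2_rank(rows):
    """Rank over F_2; each row is an int whose bits are the matrix entries."""
    rows = list(rows)
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot:
            rank += 1
            low = pivot & -pivot  # lowest set bit of the pivot row
            rows = [r ^ pivot if r & low else r for r in rows]
    return rank


def rank_probability(m, n, r):
    """P[rank = r] for a uniformly random m x n matrix over F_2."""
    p = 2.0 ** (r * (m + n - r) - m * n)
    for i in range(r):
        p *= (1 - 2.0 ** (i - m)) * (1 - 2.0 ** (i - n)) / (1 - 2.0 ** (i - r))
    return p


rng = random.Random(2010)
samples = 2000
counts = {}
for _ in range(samples):
    matrix = [rng.getrandbits(32) for _ in range(32)]
    rank = gf2_rank(matrix)
    counts[rank] = counts.get(rank, 0) + 1

for r in (32, 31, 30):
    print(r, round(rank_probability(32, 32, r), 4), counts.get(r, 0) / samples)
```

For square matrices the formula gives roughly 0.289 for full rank, 0.578 for rank 31 and 0.128 for rank 30, so rank $n - 1$ is the most likely outcome.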
We are going to introduce three cryptanalytic scenarios where such tests
can be applied. They are called "distinguishing attacks" or "distinguishers".

Generally speaking, distinguishing attacks against block ciphers aim at
determining whether a given permutation corresponds to a random permutation$^{2}$
or to a $\phi_k$. Of course, there is always a distinguishing attack against
any block cipher, since $|\mathcal{K}| < +\infty$, and so a brute-force key search will yield a
distinguishing attack of average complexity $2^{\ell - 1}$ (where $\ell$ is the key length),
but we are interested in attacks costing significantly less.

Let $m_1, \ldots, m_N$ be some plaintexts, let $k$ be any key. We denote by $\pi$ any
random permutation in $\mathrm{Sym}((\mathbb{F}_2)^r)$ and by $\phi_k$ the encryption function for the
key $k$; we have to consider the following situation, where one black box is
involved and it contains$^{3}$ either $\phi_k$ or $\pi$.

[Figure: a black box containing either $\phi_k$ or $\pi$ receives the plaintext $m_i$ and outputs $\phi_k(m_i)$ or $\pi(m_i)$.]


A single-key distinguishing attack on a cipher $C$ is any algorithm $A$ able to
distinguish the ciphertexts $\{c_i\}_{1 \le i \le N}$ from the random ciphertexts $\{\bar{c}_i\}_{1 \le i \le N}$,
using some information on the plaintexts. There are two main variants: the
chosen-plaintext and the known-plaintext. In both, formally $A$ takes as input
a set of pairs $\{(m_1, \tilde{c}_1), \ldots, (m_N, \tilde{c}_N)\}$ where

• either $\tilde{c}_i = c_i$ $\forall\, 1 \le i \le N$,

• or $\tilde{c}_i = \bar{c}_i$ $\forall\, 1 \le i \le N$,

and returns as output "true" or "false":

• $A$ outputs "true" if and only if $\tilde{c}_i = c_i$, $\forall i$ s.t. $1 \le i \le N$.

• $A$ outputs "false" if and only if $\tilde{c}_i = \bar{c}_i$, $\forall i$ s.t. $1 \le i \le N$.

The difference between the two$^{4}$ variants:

• Chosen-plaintext: A can decide the plaintexts and obtain the correspond- 
ing ciphertexts. In this case, such plaintexts are often chosen "related", 
i.e. satisfying some additional prescribed mathematical relations; 

• Known-plaintext: $A$ cannot decide the plaintexts and we can only assume
that $A$ knows a certain number of pairs (plaintext, ciphertext). In this
case, the plaintexts are often supposed random.

2 A permutation chosen uniformly at random from the set of all permutations.

3 A weaker form of distinguisher assumes that the black box contains $\phi_k$ or $\pi$ with
the same probability [Luc96].

4 There are other ways to consider the plaintexts, according to the possibilities and
the capabilities of Eve.

If we have some related keys $k_1, \ldots, k_\tau$, we can describe a second cryptanalytic
scenario. A related-key distinguishing attack on a cipher $C$ is any algorithm
$A$ able to distinguish the ciphertexts $\{c_{i,j}\}$ from the random ciphertexts $\{\bar{c}_{i,j}\}$,
as in the following scheme.





Remark 1.2. In this model, A knows additionally some mathematical relations 
between the keys used for encryption, but not the key values. Both the single 
key scenario and the related-key one describe a hypothetical situation, very 
difficult to reach in practice. Yet, a very secure block cipher must resist in 
both scenarios. 



There is another scenario where a distinguishing attack can be mounted. 
This scenario is less common and we have not found an established name in 
the literature for it, so we will call it a random-key-sample distinguisher. As 
in the related-key scenario we consider some keys $k_1, \ldots, k_\tau$, some plaintexts
$\{m_1, \ldots, m_N\}$ and the corresponding ciphertexts $\{c_{i,j}\}$. The difference is that
now we consider the keys as a random sample from the keyspace. This is a
realistic assumption, because when the session key is changed during transmissions
a new (pseudo)-random key is negotiated between the two peers.
Formally, $A$ behaves in the same way as the related-key algorithm, being able
to distinguish the set of actual encryptions $\{c_{i,j}\}$ from a set of random vectors
$\{\bar{c}_{i,j}\}$. Clearly, also for the random-key-sample scenario we could have a
chosen-plaintext variant and a known-plaintext variant, although it is rather
unlikely that a known-plaintext attack can be successful (we would have both random
keys and random plaintexts).

Our attack in the next section is of the third type. We use a strategy 
similar to that of the well-known Marsaglia test. 
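
2 Strategy description

In this section, $\alpha$ is our embedding (2).
We recall that $\dim_{\mathbb{F}_2}(\langle \mathrm{Im}(\alpha) \rangle) = 31745$. Let $a^1, a^2, \ldots, a^{31745}$ be (not necessarily
distinct) vectors in $V$. Let $\mathcal{V} = \{a^1, a^2, \ldots, a^{31745}\}$, so $|\mathcal{V}| \le 31745$. We construct
the $(31745 \times 2^{15})$-matrix $D$ such that the $i$-th row is the image of the
map $\alpha$ applied to the plaintext $a^i$, as in (3).

$$
D =
\begin{pmatrix}
\alpha(a^1) \\ \alpha(a^2) \\ \vdots \\ \alpha(a^{31745})
\end{pmatrix}
=
\begin{pmatrix}
\varepsilon(a^1) & \varepsilon(Ma^1) & \cdots & \varepsilon(M^7 a^1) \\
\varepsilon(a^2) & \varepsilon(Ma^2) & \cdots & \varepsilon(M^7 a^2) \\
\vdots & \vdots & & \vdots \\
\varepsilon(a^{31745}) & \varepsilon(Ma^{31745}) & \cdots & \varepsilon(M^7 a^{31745})
\end{pmatrix}
\qquad (3)
$$

Let $\mathcal{M}$ be the set of all such matrices. Clearly, we have
$|\mathcal{M}| = (|V|)^{31745} = (2^{128})^{31745}$.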

We note that the weight of any row is bt = 128. 

What is the rank of D if V is random in Im(a)? 

The probability that a $\nu \times n$ random matrix ($\nu < n$) with entries in $\mathbb{F}_2$ has
rank exactly $\nu$ is significantly greater than the probability of having rank equal
to $\nu - 1$ or $\nu - 2$ or less. On the other hand, for a square $n \times n$ random matrix
over $\mathbb{F}_2$ the rank $n - 1$ is the most probable. However, the most likely rank for $D$
as in (3) is not 31745, although $31745 < 2^{15}$, because our construction imposes
specific constraints, for example on the row weight. Let $d_{\mathcal{M}}$ denote the total
number of matrices in $\mathcal{M}$ and let $d_{31743}$ denote the number of all matrices in
$\mathcal{M}$ with rank less than or equal to 31743. In [RST10] we have computed the
expected rank statistics for $D$. In particular, a direct consequence of Theorem
3.19 in [RST10] is the following corollary:



Corollary 2.1.

$$\frac{d_{31743}}{d_{\mathcal{M}}} = 0.1336357.$$




Our attack is a random-key-sample distinguisher with chosen plaintext, as
detailed in the remainder of this section. 



We choose $2^{16}$ plaintexts obtained by taking

$$S = \{ v = (v_1, \ldots, v_{16}) \mid v \in (\mathbb{F}_{256})^{16},\ v_i = 0,\ 1 \le i \le 14 \}.$$

Clearly, $|S| = (2^8)^2 = 2^{16}$.

Remark 2.2. We order $(\mathbb{F}_2)^8$ following the order of the binary representation.
For example, since $(00000010) \to 2$ and $(00001100) \to 12$ we have
$(00001100) > (00000010)$. We order $((\mathbb{F}_2)^8)^2$ using the lexicographic ordering
induced by the previous order: $(a, b) > (a', b')$ if and only if either $a > a'$
or $a = a'$, $b > b'$. Once we have chosen an irreducible polynomial $p \in \mathbb{F}_2[x]$, with
$\deg(p) = 8$, we can identify $\mathbb{F}_{256}$ with $(\mathbb{F}_2)^8$ and so we can use the above
orderings to order both $\mathbb{F}_{256}$ and $(\mathbb{F}_{256})^2$.

Following the previous remark, we can write $S = \{v^1, \ldots, v^{2^{16}}\}$ where
$v^{i+1} > v^i$ for all $i$. In other words, $S$ is an ordered set of $2^{16}$ vectors.
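
The ordered plaintext set can be generated directly. The sketch below assumes that each element of $\mathbb{F}_{256}$ is stored as the byte given by its binary representation, as in Remark 2.2; the variable names are ours.

```python
# First 14 bytes fixed to zero, last two bytes free.
ZERO_PREFIX = bytes(14)

# Iterating a in the outer loop and b in the inner loop yields the
# lexicographic order on the pairs (a, b).
S = [ZERO_PREFIX + bytes([a, b]) for a in range(256) for b in range(256)]

print(len(S), S[0].hex(), S[-1].hex())
```

Since Python compares `bytes` objects lexicographically, the list is already sorted in the order described above.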

We now describe an algorithm, which we call $B$, that takes as input an
ordered set $S = \{v^1, \ldots, v^{2^{16}}\}$ of $2^{16}$ vectors in $V$ and that outputs a list
of natural numbers $r_0, \ldots, r_{31745}$ computed as follows. We construct a first
matrix $D$ starting from $\{v^1, \ldots, v^{31745}\}$. We compute the rank of $D$ and we
store the value. We repeat this procedure with $\{v^2, \ldots, v^{31746}\}$ and so on with
$\{v^{k+1}, \ldots, v^{k+31745}\}$, where $2 \le k \le 33791$. In total, we compute the rank of
$2^{16} - 31745 + 1 = 33792$ matrices. We define $r_j$ as the number of these matrices
with rank $j$ (for $0 \le j \le 31745$).
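
The sliding-window structure of algorithm $B$ can be sketched as follows. This is a toy illustration under our own assumptions: the rows are taken to be already embedded vectors stored as Python integers (one bit per coordinate), the window and row sizes are tiny, and every rank is recomputed from scratch, whereas the actual computation relies on the specialized routine of Section 4.

```python
import random
from collections import Counter


def gf2_rank(rows):
    """Rank over F_2; each row is an int whose bits are the matrix entries."""
    rows = list(rows)
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot:
            rank += 1
            low = pivot & -pivot
            rows = [r ^ pivot if r & low else r for r in rows]
    return rank


def algorithm_b(embedded_rows, window):
    """Map each rank j to the number of consecutive windows having rank j."""
    counts = Counter()
    for k in range(len(embedded_rows) - window + 1):
        counts[gf2_rank(embedded_rows[k:k + window])] += 1
    return counts


# Toy parameters (hypothetical): 40 rows of 12 bits, windows of 10 rows.
rng = random.Random(1)
toy_rows = [rng.getrandbits(12) for _ in range(40)]
toy_counts = algorithm_b(toy_rows, window=10)
print(sorted(toy_counts.items()))

# With the real parameters the number of windows is:
print(2**16 - 31745 + 1)
```

Consecutive windows share all but one row, which is one reason why a specialized rank routine pays off in the real setting.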

We applied algorithm B to S and, since the rows of these matrices are 
strongly related (they share the same first 14 bytes), we expect the corre- 
sponding ranks to be significantly lower than the most probable ones (see 
Subsection 3.1 for details). 

We can apply algorithm $B$ to $\phi_k(S)$ and to $\pi(S)$, where $\pi$ is any random
map. We would like to use the two output lists to distinguish between $\phi_k$
and $\pi$, but we are not able to do it. Instead, we choose a number $\tau$ and we
do two different operations. In one case, we apply $B$ to $\phi_{k_i}(S)$ for $\tau$ random
keys $k_1, \ldots, k_\tau$. In the other case, we apply $B$ to $\pi_i(S)$ for $\tau$ random maps
$\pi_1, \ldots, \pi_\tau$. In practice, we apply $B$ to $\tau$ random ordered sets, each containing
$2^{16}$ distinct vectors.

Finally, we use the output lists to distinguish between $\{\phi_{k_i}\}$ and $\{\pi_i\}$.

We expect that the behaviour of the ranks coming from the encrypted 
matrices is distinguishable from the theoretical distribution, and in particular 
that these ranks are lower. On the other hand, we expect that the ranks coming 
from the random matrices follow the theoretical distribution. The results are 
reported in Section 3. 

3 Numerical results 

To mount the attack successfully we need to choose $\tau$ small enough to
allow for practical computations and large enough to overcome the effects
induced by the variance in the random distribution.

We computed batches of 10 random keys (and random maps) at a time, and we
observed that the values coming from the random maps may be distinguishable
(from the expected distribution) if we consider up to 50 maps. However, with
70 maps (or more) the random maps become indistinguishable, especially
compared to the drastic values obtained by the encryptions.
Starting from a sample of 70 maps, we report the obtained rank values
corresponding to $r_0 + \ldots + r_{31743}$ and $r_{31744} + r_{31745}$:

Rank        Random     AES        Expected
> 31743     2049671    2047430    2049333
≤ 31743     315769     318010     316107

Now, we apply the $\chi^2$ test between

(1) Random and Expected: the P value equals 0.51;

(2) AES and Expected: the P value equals 0.0003.

The lower the P-value, the higher the probability that the observed data 
do not come from the theoretical distribution. It is customary in Statistics to 
consider 0.05 as a threshold. Since the value for random data is 0.51 and that 
for AES-128 is 0.0003, we may safely assume that, with high probability, the 
ranks observed for AES-128 do not come from a random sample. 
Besides, apart from the threshold, the ratio between the two P values is re- 
markable. And the difference between the AES-128 ranks and the theoretical 
distribution is exactly where we expect it to lie: in a significantly higher num- 
ber of low-rank matrices. 
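
The two P values can be approximately re-derived from the table. The sketch below assumes a Pearson goodness-of-fit test with two bins and one degree of freedom, for which the tail probability of the $\chi^2$ distribution reduces to a complementary error function; whether this is exactly the procedure used to obtain the reported figures is our assumption.

```python
import math


def chi_square_two_bins(observed, expected):
    """Pearson statistic and P value for two bins (one degree of freedom)."""
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For one degree of freedom, P[X >= chi2] = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Counts for ranks > 31743 and ranks <= 31743, taken from the table.
expected = (2049333, 316107)
chi2_random, p_random = chi_square_two_bins((2049671, 315769), expected)
chi2_aes, p_aes = chi_square_two_bins((2047430, 318010), expected)

print(f"random: chi2 = {chi2_random:.3f}, P = {p_random:.3g}")
print(f"AES:    chi2 = {chi2_aes:.3f}, P = {p_aes:.3g}")
```

Under these assumptions the computed values land close to the reported 0.51 and 0.0003.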

In the following figure, we report the results of two samples coming from
$\phi_k(S)$ (the 70 red dots) and from $\pi(S)$ (the 70 blue circles). First, we ordered
our samples according to the number of low-rank matrices: on the left the
samples with a smaller number and on the right those with a larger number.
Then we plotted this number vertically. The horizontal line corresponds to the
expected value for low-rank matrices. It should be apparent from the picture
how the two groups of values are separated.

[Figure: number of low-rank matrices for each of the 70 AES samples (red dots) and the 70 random samples (blue circles), sorted in increasing order; the horizontal line marks the expected value.]

3.1 Further remarks

In this subsection we report on special applications of our algorithm B. 
Some results are predictable: 

• since our plaintext set $S$ contains highly correlated vectors, we would
expect that $B$ outputs very low ranks when the input is $S$ itself; indeed,
in this case the output is $r_{4690} = 33792$, that is, we get exactly 33792
ranks equal to 4690. Actually, it is not difficult to prove that the dimension
of the vector space generated by $\alpha(S)$ is 4821, with arguments
similar to those of the proof of Proposition 4.2 in [RST10]. Therefore, we
would expect our 31745 vector sample to form a matrix with a lower rank
($4690 < 4821$);

• when we apply one round of AES (with any key) to $S$, algorithm $B$
outputs again $r_{4690} = 33792$. This may come as a surprise, but it is
easily explained in our framework. One round$^{5}$ means, in order, one
key addition, one S-Box, one $\lambda$ and another key addition. Thanks to
properties of the embedding $\alpha$, all the above operations are linear, except
for the S-Box (see Proposition 4.5 in [RST10]). However, the S-Box in
this case does not change the type of subspace. Indeed, after the first key
addition we have all vectors sharing the first 14 coordinates and the last
two are free to be any pair. Since the S-Box acts in parallel, it does not
change this situation and so the dimension of the whole space and the
ranks of our matrix remain unchanged;

• things change when we apply two rounds of AES; the reason is that the
MixColumns changes the structure, since it intermixes four bytes at a
time; it is true that the MixColumns in the first round does not change
the rank, but the change in the structure is fatal to the rank when the
S-Box of the second round is applied; indeed, our experiments show that
$B$ outputs in this case $r_{20548} = 33792$. Again, this lower value is justified
by the dimension of the 2-round encryption of $S$, which is 20679;

• similarly, the rank rises with the application of three rounds: $B$ outputs
in this case $r_{31661} = 33792$ and the dimension of the 3-round encryption
of $S$ is 31681;

• when we apply four rounds or more, we get values near to the random
setting (and the dimension of the subspace is 31745, since it coincides with
the whole space $\langle \mathrm{Im}(\alpha) \rangle$); in this sense, we could say that the diffusion
of AES is working from 4 rounds and above, as it is usually agreed.

5 In the translation-based notation we are performing Round 0 and Round 1.
Some results are largely unexpected. For example:

• let us consider two random maps $\pi_1(S)$, $\pi_2(S)$ and two encryption functions
$\phi_{k_1}(S)$ and $\phi_{k_2}(S)$ ($k_1 \ne k_2$). Now, we apply algorithm $B$ to the
four corresponding sets and we obtain the following rank distributions.

Rank        Random_1   Random_2   AES_1    AES_2    Expected
31745       9782       9467       9765     9554     9759
31744       19482      19765      19525    19569    19517
≤ 31743     4528       4560       4502     4669     4516

As before, we apply the $\chi^2$ test between

(1) Random_1 and Expected: the P value equals 0.928;

(2) Random_2 and Expected: the P value equals 0.002;

(3) AES_1 and Expected: the P value equals 0.97;

(4) AES_2 and Expected: the P value equals 0.008.

We note that case (1) and case (3) are statistically indistinguishable
from the expected distribution, while cases (2) and (4) appear statistically
distinguishable from the expected. In other words, if we apply our
test only to one key$^{6}$, it fails badly, because it would distinguish with
probability 0.5, that is, it outputs a random value! The reason for the
single-key failure lies in the large variance among our samples. This is
why, in order to overcome this problem, we had to consider more keys:
the right $\tau$, large enough to highlight the statistical differences and still
small enough to compute efficiently the corresponding ranks.

• When we mount our attack we have the freedom to consider for the $\chi^2$
test whatever combination of the ranks $r_j$ we like, as long as random
samples are not distinguishable and AES samples are. We tuned our test
to consider only two values (ranks up to 31743 and those higher).
Two other choices are possible.

One would be to look only at even smaller ranks, such as "ranks lower 
than 31740" (and those higher). We have discarded this option because 
smaller ranks are very infrequent and we would then need a very large 
sample in order to validate our tests. 

The second choice is to consider more ranks, for example three ranks, as
in the following table. We considered a total sample having 70 elements.

Rank        Random     AES        Expected
31745       684191     682317     683111
31744       1365480    1365113    1366222
≤ 31743     315769     318010     316107

According to the $\chi^2$ test we have that:

(1) Random and Expected: the P value equals 0.29;

(2) AES and Expected: the P value equals 0.0013.

6 That is, if we try to mount a single-key attack.

So again we would distinguish between random and AES-128, but with
more difficulty. This can actually be explained a posteriori: it is true that
our embedding would induce fewer maximum-rank matrices (and the numbers
confirm this: $682317 < 683111$), but they might become 31744-rank
matrices and so add to the noisiest value$^{7}$, and indeed the 31744-rank
matrices in the AES-128 sample are only slightly fewer than expected.
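
The three-bin P values in this subsection can be approximately re-derived in the same spirit. The sketch below assumes a Pearson goodness-of-fit test with three bins and two degrees of freedom, where the tail probability of the $\chi^2$ distribution has the closed form $e^{-\chi^2/2}$; that this is the exact procedure behind the reported figures is our assumption.

```python
import math


def chi_square_three_bins(observed, expected):
    """Pearson statistic and P value for three bins (two degrees of freedom)."""
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For two degrees of freedom, P[X >= chi2] = exp(-chi2 / 2).
    return chi2, math.exp(-chi2 / 2)


# Counts for ranks 31745, 31744 and <= 31743, taken from the two tables.
single_expected = (9759, 19517, 4516)
single_key = {
    "Random_1": (9782, 19482, 4528),
    "Random_2": (9467, 19765, 4560),
    "AES_1": (9765, 19525, 4502),
    "AES_2": (9554, 19569, 4669),
}
sample_expected = (683111, 1366222, 316107)
seventy = {
    "Random": (684191, 1365480, 315769),
    "AES": (682317, 1365113, 318010),
}

p_single = {k: chi_square_three_bins(v, single_expected)[1] for k, v in single_key.items()}
p_seventy = {k: chi_square_three_bins(v, sample_expected)[1] for k, v in seventy.items()}

for name, p in {**p_single, **p_seventy}.items():
    print(f"{name}: P = {p:.3g}")
```

Under these assumptions the single-key values fluctuate on both sides of the usual 0.05 threshold for random maps and encryptions alike, while the 70-element samples separate the two cases.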

We did not report on experiments on the other standard versions of AES 
(AES-256 and AES-192), but our preliminary tests seem to indicate that our 
test works well also in those cases, with only a slight worsening of the P 
value ratio. Indeed, our strategy is independent from the key-length of AES, 
since our approach is actually independent from the key-schedule and only 
marginally dependent on the round numbers. 

4 Computational effort 

The algorithm developed to compute the ranks for the attacks is specialized
for $\mathbb{F}_2$ and is described in [BR10]. Here we provide a sketch.

Since the matrix is rather large (circa $2^{15} \times 2^{15}$), we must keep the heaviest
part of the computation within the cache. However, some steps on the
whole matrix cannot be avoided, so we need a strategy that keeps these to a
minimum. In particular, we may need both column and row permutations. In
the description below, we assume that we do not need them.

Remark 4.1. To be able to avoid permutations is equivalent to having performed
a preprocessing such that each upper-left block is square and
non-singular. Of course this cannot be done a priori, but we stick to this for
clarity's sake.

Without the technicalities pertaining to permutations, our algorithm reduces
to a variant of a recursive LU decomposition ([GV96]). Let the matrix $M$ be
in $(\mathbb{F}_2)^{n \times n}$. If the whole $M$ does not fit into the cache, we partition $M$ into
four blocks of approximately the same size.

7 The random variable $r_{31744}$ has the largest variance.

Let $M_1$ be the left-top block. If it does not fit, then we partition $M_1$
similarly in four blocks and so on. Let $A$ be the smallest block that does not
fit. We will have

$$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} \qquad (4)$$

where $A_{11}$ fits into the cache. Thanks to Remark 4.1, we can assume that
block $A_{11}$ is square and non-singular. We can build the LU decomposition
$A_{11} = L_{11} U_{11}$ and consider the following equality

$$
\begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}
=
\begin{pmatrix} L_{11} & 0 \\ A_{21} U_{11}^{-1} & I \end{pmatrix}
\begin{pmatrix} U_{11} & L_{11}^{-1} A_{12} \\ 0 & \tilde{A}_{22} \end{pmatrix}
$$

where $\tilde{A}_{22} = A_{22} - A_{21} A_{11}^{-1} A_{12}$ is the Schur complement. Even if
block $\tilde{A}_{22}$ is singular we can compute the LU decomposition $\tilde{A}_{22} = L_{22} U_{22}$
and the final LU decomposition of matrix $A$, as follows:

$$
A =
\underbrace{\begin{pmatrix} L_{11} & 0 \\ A_{21} U_{11}^{-1} & L_{22} \end{pmatrix}}_{L}
\underbrace{\begin{pmatrix} U_{11} & L_{11}^{-1} A_{12} \\ 0 & U_{22} \end{pmatrix}}_{U}
$$


Once we have the LU decomposition of A, we use it to recursively compute 
the LU decomposition of larger blocks containing A, until we reach a global 
LU decomposition for the whole M. From it, the rank determination is trivial, 
because it is enough to count the nonzeros in the diagonal of L. 
The three most expensive steps in the above algorithm are:

(1) the LU decomposition of block $A_{11}$.

(2) the LU decomposition of block $\tilde{A}_{22}$.

(3) the computation of the Schur complement $\tilde{A}_{22}$.

All three steps cost at most $O(n^3)$ with standard matrix multiplication; however,
the actual constants differ and depend on the matrix structure and sparsity.
The cost of matrix multiplication can be lowered theoretically with the
Strassen method ([Str69]), but our matrices are too small to take advantage
of it. However, they are large enough to warrant the use of the well-known Four
Russians algorithm ([Ang76]). We refer to [BR10] for our exact strategy.
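
The role of the Schur complement in rank computation can be illustrated on small matrices: when the top-left block is non-singular, the rank of the whole matrix equals the size of that block plus the rank of the Schur complement. The following is a plain, unoptimized sketch over $\mathbb{F}_2$ with helper names of our own; it is not the cache-aware routine referred to above.

```python
import random


def gf2_rank(matrix):
    """Rank over F_2 of a matrix given as a list of 0/1 rows."""
    rows = [sum(bit << j for j, bit in enumerate(row)) for row in matrix]
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot:
            rank += 1
            low = pivot & -pivot
            rows = [r ^ pivot if r & low else r for r in rows]
    return rank


def gf2_mul(a, b):
    return [[sum(x & y for x, y in zip(row, col)) % 2 for col in zip(*b)] for row in a]


def gf2_add(a, b):
    return [[x ^ y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def gf2_inv(a):
    """Gauss-Jordan inverse over F_2 (raises StopIteration if a is singular)."""
    n = len(a)
    aug = [row[:] + [int(i == j) for j in range(n)] for i, row in enumerate(a)]
    for c in range(n):
        p = next(r for r in range(c, n) if aug[r][c])
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(n):
            if r != c and aug[r][c]:
                aug[r] = [x ^ y for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def random_matrix(rng, rows, cols):
    return [[rng.randint(0, 1) for _ in range(cols)] for _ in range(rows)]


def random_invertible(rng, n):
    # Product of unit lower- and unit upper-triangular matrices.
    lower = [[1 if i == j else (rng.randint(0, 1) if j < i else 0) for j in range(n)] for i in range(n)]
    upper = [[1 if i == j else (rng.randint(0, 1) if j > i else 0) for j in range(n)] for i in range(n)]
    return gf2_mul(lower, upper)


def rank_via_schur(a11, a12, a21, a22):
    # Over F_2 subtraction is addition: S = A22 + A21 * A11^{-1} * A12.
    schur = gf2_add(a22, gf2_mul(gf2_mul(a21, gf2_inv(a11)), a12))
    return len(a11) + gf2_rank(schur)


rng = random.Random(4)
a11 = random_invertible(rng, 6)
a12, a21, a22 = random_matrix(rng, 6, 5), random_matrix(rng, 5, 6), random_matrix(rng, 5, 5)
full = [r1 + r2 for r1, r2 in zip(a11, a12)] + [r1 + r2 for r1, r2 in zip(a21, a22)]
print(gf2_rank(full), rank_via_schur(a11, a12, a21, a22))
```

The two printed numbers coincide; applying the same identity recursively to the blocks is what makes a block-wise strategy possible.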
Do AES encryptions act randomly?

4.1 Cost of the attack

The attack works very well with 128 keys, although 70 are usually enough. 
We draw a very conservative estimate of its cost in this subsection. 

The attacker needs to collect $2^{16}$ pairs encrypted with the same key. Since
we are requiring 128 keys, it means that the total number of pairs is

$$2^{16} \cdot 2^{7} = 2^{23}.$$

For any key, the attacker must compute about $2^{15}$ matrix ranks. Any rank
computation costs $2^{26}$ encryptions. Therefore, the total cost of the attack is

$$2^{26} \cdot 2^{15} \cdot 2^{7} = 2^{48} \text{ encryptions.}$$
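
The arithmetic of the estimate can be replayed in a few lines. The per-rank cost of $2^{26}$ encryptions is taken from the text as given; the variable names are ours, and the exact window count is shown next to its power-of-two approximation.

```python
import math

pairs_per_key = 2 ** 16
keys = 2 ** 7                                  # 128 keys
ranks_per_key_exact = 2 ** 16 - 31745 + 1      # number of sliding windows
ranks_per_key_rounded = 2 ** 15                # "about 2^15" in the estimate
cost_per_rank = 2 ** 26                        # stated cost, in encryptions

total_pairs = pairs_per_key * keys
total_cost_rounded = cost_per_rank * ranks_per_key_rounded * keys
total_cost_exact = cost_per_rank * ranks_per_key_exact * keys

print(math.log2(total_pairs))
print(math.log2(total_cost_rounded))
print(round(math.log2(total_cost_exact), 3))
```

Using the exact window count instead of $2^{15}$ changes the exponent only in the second decimal place.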



5 Conclusions 

Reduced-round versions of AES-128, AES-192, AES-256 are known to be
weak, although none of these attacks come close to the actual number of
rounds. The best-known attacks use advanced differential cryptanalysis and
depend heavily on the key-scheduling algorithm. Our distinguishing attack is
independent of the key-schedule and depends only on the round structure.
Therefore, it may be successful even if a huge number of rounds is used.
We strongly invite anyone to try our attack, with any number of rounds, and
we make our software freely available at

http://www.science.unitn.it/~sala/AES/

The more statistical evidence we collect, the more confidence we will have in
our results. Of course, it is possible that a refinement of our approach might
lead to a key-recovery algorithm. Yet, we have not been able to see how, since
the link between the key and the rank statistics is still unclear.